Diagnostic biomarker panels of osteoarthritis: UPLC-QToF/MS-based serum metabolic profiling

Osteoarthritis (OA) is the most common joint disease in the world. It is characterized by pain and loss of joint function, which seriously reduce patients’ quality of life. In this work, ultrahigh performance liquid chromatography coupled with quadrupole time-of-flight tandem mass spectrometry (UPLC-QToF/MS), in conjunction with multivariate pattern recognition methods and a univariate statistical analysis scheme, was applied to explore the serum metabolic signatures of an OA group (n = 31), a healthy control (HC) group (n = 57) and a non-OA group (n = 19) for the early diagnosis and differential diagnosis of OA. Based on logistic regression analysis and receiver operating characteristic (ROC) curve analysis, a panel of seven metabolites, including phosphatidylcholine (18:0/22:6) and p-cresol sulfate, was identified as critical for distinguishing OA from HC and yielded an area under the curve (AUC) of 0.978. A second panel, consisting of unknown m/z 239.091, phosphatidylcholine (18:0/18:0) and phenylalanine, distinguished OA from non-OA and achieved an AUC of 0.888. These potential biomarkers are mainly involved in lipid metabolism, glucose metabolism and amino acid metabolism. The altered metabolic pathways are expected to provide new insight into OA pathogenesis.


INTRODUCTION
Osteoarthritis (OA) is the most prevalent form of arthritis and a major source of joint pain and disability (Mathiessen & Conaghan, 2017). Its pathological hallmarks are the loss of articular cartilage, the hypertrophic differentiation of chondrocytes, subchondral bone thickening, synovial inflammation and osteophyte formation (Li et al., 2018), which manifest as joint stiffness, pain and dysfunction. Multiple factors are thought to be associated with OA, including aging, previous joint injury, obesity, genetics, sex and anatomical factors. However, the exact etiological mechanism has not yet been fully elucidated (Loeser, Collins & Diekman, 2016;Prieto-Alhambra et al., 2014). Non-steroidal anti-inflammatory drugs and intra-articular steroid injections are usually used clinically to relieve the pain and inflammation of OA, but their efficacy is limited and their toxicity is considerable (Mathiessen & Conaghan, 2017). OA affects over 250 million individuals worldwide (Carlson et al., 2019), and emerging evidence suggests that this number will rise steadily, which not only impairs people's quality of life but also imposes a substantial socioeconomic burden (Mathiessen & Conaghan, 2017;Zhai, Randell & Rahman, 2018).
At present, early diagnosis of OA is paramount but difficult. Radiography and clinical manifestations are the main diagnostic methods. However, OA is typically diagnosed at the onset of clinical symptoms, by which time structural damage is already irreversible (Carlson et al., 2018). More seriously, distinguishing OA from other arthritic diseases is challenging because of their similar clinical symptoms and pathological features (Jiang et al., 2013). Common manifestations of arthritic diseases are inflammation and functional degradation of connective tissues, along with pain and stiffness (Shim et al., 2018). These difficulties not only cause considerable confusion in diagnosis but also delay the use of optimal therapy. Taken together, there is still a lack of reliable markers with high sensitivity and specificity for the early and accurate diagnosis of OA, which highlights the need to discover specific biomarkers.
Metabolomics is an emerging field that reflects the metabolic response of living systems to pathophysiological stimuli or environmental conditions (Zeng et al., 2017) and focuses on low-molecular-weight metabolites to detect potential diagnostic biomarkers. Currently, nuclear magnetic resonance spectroscopy and mass spectrometry (MS) coupled with gas chromatography or liquid chromatography (LC) are the most common analytical tools for detecting metabolites and have been widely used in various clinical metabolomics studies (Hackshaw et al., 2019;Yang et al., 2018a). Among these techniques, LC separates complex matrices satisfactorily and MS offers high sensitivity, resolution and reproducibility. LC coupled with MS attains the most comprehensive coverage of metabolic features (Han et al., 2019), which makes it more suitable for metabolite screening. So far, this well-established platform has been widely applied in research on many diseases, such as ovarian cancer, acute myeloid leukemia and breast cancer (Shao et al., 2016;Wang et al., 2019;Yang et al., 2018b). In addition, serum could be an ideal biomedium for metabolic profiling since it is relatively common, more accessible, more stable and less invasive to obtain than other biosamples. Compared with whole blood and plasma, serum components are unaffected by cellular components and hemolytic factors. In contrast to urine, serum is less susceptible to dietary factors and sampling time. Moreover, serum is relatively simple to store: it can be kept at room temperature for a short time, and long-term storage at −80 °C has no great impact on most tests. More importantly, serum can provide a snapshot of metabolic dynamics and directly reflect alterations in endogenous marker concentrations, helping to probe deeper into the pathogenesis of diseases (Yang et al., 2017). Therefore, serum metabolomics is still the mainstream tool for biomarker discovery in many clinical studies (Dong et al., 2015;Khan et al., 2019;Pan et al., 2019).
Systemic metabolic dysregulation takes place in the pathogenesis of OA (June et al., 2016). If the relevant metabolites could be identified before irreversible degeneration occurs, they would support clinical diagnostic decisions, helping to delay OA progression and minimize its negative effects on society. Hence, ultrahigh performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC-QToF/MS), combined with univariate and multivariate statistical analysis, was used to identify a distinct serum metabolic signature that robustly discriminates OA patients from healthy controls (HC) as well as from a non-osteoarthritis (non-OA) cohort. The primary objective of this study was to explore the differentially expressed serum metabolites of OA that could be helpful for diagnosis at an early stage. In addition, we aimed to better understand the dysfunction mechanism of OA at the metabolic level.

Study design
Ethics approval was obtained from the Ethics Committee of Fujian Medical University (No. 2019[34]). In this study, a total of 107 participants were recruited in 2019 at the First Affiliated Hospital of Fujian Medical University. Written informed consent was obtained from each participant before blood samples were taken. Based on the international consensus diagnostic criteria, 31 patients with OA were enrolled as the OA group and 19 patients with other forms of arthritis as the non-OA group, including 11 with rheumatoid arthritis, three with ankylosing spondylitis, two with gout, one with psoriatic arthritis, one with septic arthritis and one with psoriasis. The healthy control (HC) group consisted of 57 healthy volunteers with no declared history of arthritis. The general characteristics of the subjects in each group are summarized in Table 1, and all groups were matched by age and gender.
All blood samples were collected by venipuncture, and serum was immediately separated by centrifugation at 3,000 g for 10 min at 4 °C. Aliquots (200 µL) of the serum supernatants were quickly stored at −80 °C until the UPLC-QToF/MS analysis was conducted.

Chemicals and reagents
High performance liquid chromatography (HPLC) grade methanol and acetonitrile were obtained from Merck (Darmstadt, Germany). Analytical grade formic acid and . L-2-chlorophenylalanine and standards for metabolite identification were purchased from Shanghai Aladdin Bio-Chem Technology Co., Ltd. Ultra-pure water was produced with a Milli-Q purification system (Bedford, MA, USA). The 2.0-mL vials were purchased from Agilent (Palo Alto, CA, USA).

Serum sample preparation
Before analysis, each serum sample was thawed at 4 °C and immediately centrifuged at 20,000 g for 10 min. Subsequently, a 100 µL aliquot of serum was mixed with 400 µL of extraction solvent, namely methanol-acetonitrile (50:50, v/v) containing L-2-chlorophenylalanine (0.6 mg/ml) as internal standard (IS), in a 2.0-mL Eppendorf (EP) tube. The mixture was vigorously shaken for 30 s, kept at −20 °C for 20 min and then centrifuged at 20,000 g for 15 min. Afterwards, a 200 µL aliquot of the supernatant was transferred to a 1.5-mL EP tube and centrifuged at 30,000 g for 5 min. Finally, 150 µL of the supernatant was transferred into a 2.0-mL vial for UPLC-QToF/MS analysis. To ensure consistent conditions and the reliability of the analytical system, a quality control (QC) sample was prepared by mixing equal aliquots (10 µL) of each individual sample and was pretreated in the same way as the serum samples (Acera et al., 2019). The pooled QC sample was injected six times at the start of the analytical batch to condition the column, and once after every 10 injections of serum samples throughout the analytical workflow. The QC sample was therefore analyzed a total of 17 times during the whole analysis.
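The QC injection scheme can be sketched as a run-order generator. This is only an illustration: the function name and sample labels are hypothetical, and the closing QC after the last, incomplete block of samples is an assumption made here so that the count matches the 17 QC injections reported for 107 serum samples.

```python
def build_injection_sequence(n_samples, n_conditioning=6, qc_every=10):
    """Return the run order as a list of 'QC' and 'S<n>' labels (hypothetical labels)."""
    # Conditioning injections of the pooled QC at the start of the batch
    sequence = ["QC"] * n_conditioning
    for i in range(1, n_samples + 1):
        sequence.append(f"S{i}")
        # One QC after every complete block of `qc_every` serum samples
        if i % qc_every == 0:
            sequence.append("QC")
    # Assumption: a closing QC brackets the final, incomplete block
    if n_samples % qc_every != 0:
        sequence.append("QC")
    return sequence


sequence = build_injection_sequence(107)
n_qc = sequence.count("QC")
print(n_qc)  # 6 conditioning + 10 in-run + 1 closing = 17
```

Without the closing QC the same scheme would give 16 QC injections, so the reported total suggests, but does not prove, that the batch ended on a QC.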
The mass spectrometer was operated in both positive (ESI+) and negative (ESI−) electrospray ionization modes, and data were acquired over a mass range of 50-1,000 mass-to-charge ratio (m/z). The capillary voltage was set at 3.0 kV (ESI+) or 2.5 kV (ESI−), with the sample cone voltage at 35 V (ESI+) or 40 V (ESI−). The source temperature was set at 120 °C (ESI+) or 100 °C (ESI−) and the desolvation temperature at 450 °C. The cone and desolvation gas flows were set to 50 and 650 L/h, respectively. To ensure mass accuracy and monitor the signal in real time, leucine enkephalin (LE) was used as the reference compound (m/z 556.2771 in ESI+ and 554.2615 in ESI−) at a concentration of 1 mg/ml under a flow rate of 10 ml/min, and the lock spray frequency was set at 10 s for real-time accurate mass correction. The data were collected in centroid mode with a scan time of 0.5 s. The MSE mode was applied to acquire the MS/MS spectra of representative fragments, using two acquisition functions with different collision energies: a low collision energy of 20 V and a high collision energy of 30 V.
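Mass accuracy against the lock-mass reference is usually expressed in parts per million (ppm). The short sketch below shows only that arithmetic, using the two LE reference values given above; the measured value in the example is invented for illustration, and the instrument software's actual correction procedure is not reproduced here.

```python
# Reference m/z values of the lock-mass compound (LE) as stated in the text
LE_REFERENCE_MZ = {"ESI+": 556.2771, "ESI-": 554.2615}


def ppm_error(measured_mz, reference_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


# Hypothetical measured LE signal in positive mode
example_error = ppm_error(556.2799, LE_REFERENCE_MZ["ESI+"])
print(round(example_error, 2))
```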

Data analysis
The raw data from UPLC-QToF/MS were first processed in MarkerLynx Applications Manager version 4.1 (Waters, Milford, MA, USA) for peak finding, filtering and alignment. The main parameters (Sun et al., 2016) were as follows: retention time (RT) range 0-14 min; mass range 50-1,000; extracted ion chromatogram (XIC) window 0.02 Da; RT window 0.2 min; mass window 0.02 Da; marker intensity threshold 2,500 counts; noise elimination level 6. A three-dimensional data matrix was then generated, consisting of the sample name, the RT and m/z pair, and the ion peak areas corrected with LE. The observed m/z of every compound was corrected online. The data were subsequently exported to Microsoft Office Excel 2007 for IS peak area normalization and for removal of features with missing values according to the 80% rule (Dong et al., 2015).
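The two spreadsheet steps can be sketched in a few lines. The function names are hypothetical, and the exact form of the 80% rule is an assumption: the version below is a common formulation in which a feature is kept if it is detected in at least 80% of the samples of at least one group. The study performed these steps in Excel, so this is an illustration rather than the authors' procedure.

```python
def normalize_to_is(peak_areas, is_area):
    """Divide each feature's peak area by the internal-standard area of the same sample."""
    return [area / is_area for area in peak_areas]


def passes_80_percent_rule(values_by_group, min_fraction=0.8):
    """values_by_group maps a group name to that feature's areas; 0 or None means missing.

    Assumed rule: keep the feature if any single group has at least
    `min_fraction` of its samples with a detected (non-missing) value.
    """
    for values in values_by_group.values():
        detected = sum(1 for v in values if v)
        if values and detected / len(values) >= min_fraction:
            return True
    return False


# Invented example data
normalized = normalize_to_is([200.0, 50.0], is_area=100.0)
feature_a = {"OA": [5.1, 4.8, 0, 5.0, 4.9], "HC": [0, 0, 1.2, 0, 0]}
feature_b = {"OA": [0, 0, 3.0, 0, 2.0], "HC": [0, 1.0, 0, 0, 1.1]}
print(normalized, passes_80_percent_rule(feature_a), passes_80_percent_rule(feature_b))
```

Here `feature_a` is retained because four of five OA samples contain it, while `feature_b` is dropped because neither group reaches the threshold.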
Thereafter, the pretreated data were transferred to Simca-P software (version 14.1; Umetrics AB, Malmö, Sweden) (Tu et al., 2021), which was used for principal component analysis (PCA) and partial least squares (PLS) regression. PCA (Trygg, Holmes & Lundstedt, 2007), an unsupervised method, was first applied to obtain an unbiased overview of all samples. A supervised model was subsequently built by orthogonal partial least squares-discriminant analysis (OPLS-DA), which focuses on maximizing the separation between groups and is able to identify altered endogenous molecules (Gao et al., 2012). Pareto scaling was applied to center and scale the variables (Yang et al., 2018a). The quality of the fitted model was evaluated by the parameter R²Y, the fraction of the variance in the Y variables explained by the model, and the parameter Q², the predictive ability of the model. Moreover, to guard against model over-fitting and obtain higher data fidelity for biomarker screening, permutation tests were performed 100 times. The metabolites that contributed to the classification were selected according to their variable importance in the projection (VIP) values from the OPLS-DA model, which describe the overall contribution of each variable to the model; metabolites with higher VIP values have a greater influence on the differentiation among groups. In this study, the VIP threshold was set to 4 to narrow down the range of candidates and improve the accuracy of screening for potential biomarkers.
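Pareto scaling mean-centers each variable and divides it by the square root of its standard deviation, so that large peaks are down-weighted less aggressively than under unit-variance scaling. A minimal sketch for one feature column follows; the function name and the example values are illustrative, and the use of the sample standard deviation is an assumption about the convention.

```python
import math
import statistics


def pareto_scale(column):
    """Mean-center a feature column and divide by the square root of its SD."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (assumed convention)
    if sd == 0:
        # A constant feature carries no variance; return zeros instead of dividing by 0
        return [0.0 for _ in column]
    return [(x - mean) / math.sqrt(sd) for x in column]


column = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented normalized peak areas
scaled = pareto_scale(column)
print([round(v, 3) for v in scaled])
```

After scaling, the column has zero mean and a standard deviation equal to the square root of the original one, which is the defining property of this transform.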
To further evaluate the differences in the preselected metabolites among the HC, OA and non-OA groups, the Kruskal-Wallis test, a nonparametric univariate method that does not require normally distributed samples, was performed in Statistical Product and Service Solutions (SPSS) 17.0 (IBM Corp., Armonk, NY, USA) (Gündüzöz et al., 2017). Statistical significance was evaluated by comparing the normalized peak areas between the OA group and the HC or non-OA group. Metabolite features with a p-value less than 0.05 were considered statistically significant. In this way, the range of potential biomarkers among the preselected metabolites was further narrowed.
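For readers without SPSS, the Kruskal-Wallis statistic for three groups can be sketched with the standard library alone. The sketch uses the usual tie-corrected H statistic and the chi-square approximation, which for two degrees of freedom (three groups) reduces to p = exp(−H/2). The function names and the example data are illustrative, and whether SPSS reported asymptotic or exact p-values in the study is not stated.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3groups(a, b, c):
    """Return (H, p) for three groups using the chi-square approximation (df = 2)."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
    return h, p


# Invented, fully separated example groups
h_stat, p_value = kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_stat, 3), round(p_value, 4))
```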
To further ascertain whether these potential biomarkers have diagnostic ability, receiver operating characteristic (ROC) curves were plotted and the area under the curve (AUC) was computed by numerical integration of the curve in SPSS 17.0. Markers with an AUC value greater than 0.8 were considered to have superior diagnostic performance (Alizadeh-Sedigh et al., 2022;Choe et al., 2022;Decraecker et al., 2022;Saleem et al., 2017). In addition, logistic regression analysis was carried out to integrate these potential biomarkers. The optimal sensitivity and specificity were determined at the maximum of the Youden index, which is calculated as follows: Youden's index = sensitivity + specificity − 1 (Alfitian et al., 2022;Hatakenaka et al., 2022;Marty et al., 2013).
To estimate the pathways disturbed in OA, metabolic pathway analysis was performed with the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/) (Lin et al., 2021) based on the potential biomarkers from this study.
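The ROC workflow can be illustrated with a short standard-library sketch: build the ROC points by sweeping the threshold over the observed scores, integrate with the trapezoidal rule, and pick the cut-off that maximizes the Youden index. The function names, scores and labels are invented for illustration, and the sketch assumes that a higher score indicates a case; it does not reproduce the SPSS implementation.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr, threshold) triples; label 1 = case, 0 = control."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0, float("inf"))]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos, thr))
    return points


def auc_trapezoid(points):
    """Numerically integrate the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_youden(points):
    """Return (J, sensitivity, specificity, threshold) at the maximum of J."""
    fpr, tpr, thr = max(points, key=lambda p: p[1] - p[0])
    return tpr - fpr, tpr, 1 - fpr, thr


# Invented scores (for example, predicted probabilities from a logistic model)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
points = roc_points(scores, labels)
auc = auc_trapezoid(points)
j, sensitivity, specificity, cutoff = best_youden(points)
print(auc, j, sensitivity, specificity, cutoff)
```

As a check of the formula against a value reported later in the paper, the single-metabolite result for 12-HETE (sensitivity 84.2%, specificity 58.1%) gives 0.842 + 0.581 − 1 = 0.423.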

Data quality assessment
The stability of the analytical run was first assessed by overlaying the chromatograms of the QC injections. As illustrated in Fig. 1, the base peak intensity (BPI) chromatograms overlapped well, indicating good reliability of the method.
Furthermore, the RT and normalized peak areas of five XIC peaks in the two ionization modes were examined across the 17 injections of the QC sample. The variation in RT and normalized peak area of these five ions was calculated and is expressed as relative standard deviation (RSD) in Table 2. The RSDs of RT and normalized peak area ranged from 0.00 to 1.16% and from 4.92 to 9.95%, respectively. In both ionization modes, the reproducibility of RT and normalized peak area was acceptable, which collectively demonstrates the robustness of this metabolomic platform.
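The RSD is the standard deviation expressed as a percentage of the mean. A minimal sketch follows, with invented retention times standing in for the QC measurements; the function name is illustrative and the sample standard deviation is an assumed convention.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation: sample SD divided by the mean, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Invented retention times (min) of one ion across repeated QC injections
qc_retention_times = [5.02, 5.03, 5.01, 5.02, 5.02]
rt_rsd = rsd_percent(qc_retention_times)
print(round(rt_rsd, 2))
```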

Overall serum metabolomic analysis
Each serum sample was analyzed by UPLC-QToF/MS-based untargeted metabolomics, and typical BPI chromatograms of the three groups (OA, non-OA and HC) in ESI+ and ESI− are presented in Fig. 2. A total of 4,577 (ESI+) and 5,804 (ESI−) metabolic features were obtained from the UPLC-QToF/MS analysis. Multivariate statistical analyses were then used to build the metabolic profiles of the OA, HC and non-OA groups. First, based on the metabolite spectra of the samples from the three groups, PCA 3-dimensional (3D) score plots in ESI+ and ESI− were obtained and are depicted in Figs. 3A and 3B, respectively. However, no distinct discrimination among the groups could be observed.
To obtain a clearer discrimination, supervised OPLS-DA models were then applied. The OPLS-DA 3D score plots of the two ionization modes are illustrated in Figs. 3C and 3D and present a more explicit group classification. The explained variation parameter R²Y and the cross-validation parameter Q² of the OPLS-DA models were 0.918 and 0.580 in ESI+, and 0.586 and 0.372 in ESI−, showing a satisfactory separation tendency. These results implied that OA patients have a characteristic serum metabolic profile and that it is feasible to search for underlying biomarkers.

OA-related metabolites screening
Metabolites for OA diagnosis
To identify the metabolites differentiating OA patients from control subjects, a supervised statistical model based on the OA and HC groups was built. As shown in Fig. 4, a distinct separation between the two groups can be observed, with R²Y and Q² values of 0.961 and 0.703 in ESI+ (Fig. 4A), and 0.838 and 0.590 in ESI− (Fig. 4B), suggesting that metabolite levels are indeed altered in OA patients. In addition, permutation tests with 100 random iterations were conducted for model validation. As presented in Figs. 4C and 4D, the permutation results were encouraging: all permuted R² and Q² values were lower than the original points on the right, and the intercept of Q² was below zero, supporting the validity of the constructed model (Yang et al., 2018b).

Metabolites for differential diagnosis
To select the metabolites that could be used to differentiate OA patients from non-OA subjects, another pairwise comparison based on the OA and non-OA samples was performed. The OPLS-DA 3D score plots are illustrated in Figs. 5A and 5B and again display a favorable separation between the two groups. The model parameters were satisfactory: R²Y = 0.906 and Q² = 0.396 in ESI+, and R²Y = 0.921 and Q² = 0.396 in ESI−. Thus, it is speculated that OA patients show differential biochemical changes compared with patients with other types of arthritis. The corresponding permutation test results of the models are depicted in Figs. 5C and 5D. All permuted R² and Q² values were lower than the original points on the right, and the intercept of Q² was below zero, indicating that the above OPLS-DA models were valid and not over-fitted. In addition, according to VIP value > 4 and p value < 0.05, a total of 15 differentially regulated metabolites were retained and their detailed information is listed in Table 3, including nine members of PC species, unknown m/z 239.091, unknown m/z 195.101, PFOS, docosahexaenoic acid (DHA), phenylalanine (Phe) and indoxyl sulfate.

Optimization of potential biomarkers
The aforesaid results suggested that the differential metabolites could be used as specific diagnostic biomarkers. ROC curves were established for the OA-related metabolites to test the diagnostic effectiveness of each single metabolite, and the corresponding AUC values are listed in Table 3. Every candidate had an AUC value exceeding 0.6. To better understand how multiple metabolites collectively distinguish OA from the HC and non-OA groups, stepwise logistic regression models were built on the above 14 and 15 candidates, respectively. A group of seven candidates was selected according to p < 0.05 in the logistic regression to distinguish OA from HC, namely unknown m/z 239.091 (standardized (  0.888, respectively, yielding greater diagnostic value than most of the single candidates, whose AUC values were lower than 0.8 (Table 3).

OA-related metabolic pathways
Our results showed that distinctly differential metabolites occur in OA, indicating that metabolic networks may be perturbed. Further metabolic pathway analysis was therefore needed to clarify the overall status of the altered metabolic markers and explore the possibly disturbed pathways. Pathway analysis of the above potential biomarkers with the KEGG database showed that the pathways dysregulated in OA were arachidonic acid metabolism; glycerophospholipid metabolism; the de novo lipogenesis pathway; tyrosine and Phe metabolism; and glycolysis (Fig. 7).

DISCUSSION
OA is a leading cause of disability worldwide, and its early and accurate diagnosis is still hampered by the lack of sensitive and specific biomarkers. Many studies have demonstrated that metabolic components are associated with the genesis and development of OA (Batushansky et al., 2019;de Sousa et al., 2017). Metabolomics could serve as a vital intermediate tool between basic and clinical research by providing potential biomarkers (Atanassova, Panchev & Ivanova, 2010;Spratlin, Serkova & Eckhardt, 2009). In view of this, a global, unbiased serum metabolomic analysis using a UPLC-QToF/MS platform was performed in the present study. Our previous study showed that combining reverse phase liquid chromatography (RPLC) and hydrophilic interaction liquid chromatography (HILIC) columns can indeed increase the number of metabolites detected in serum metabolomics (Gao et al., 2015). However, in terms of simplicity and cost, it is not convenient or practical to use both RPLC and HILIC columns for disease screening and diagnosis in a hospital laboratory. Therefore, an RPLC-QToF/MS-based metabolomic workflow was employed, as in other studies that used a single reverse phase column to separate metabolites in serum and urine samples (Zhang et al., 2022).
In our study, 20 serum metabolites were found to be related to OA. Among them, serum 12-HETE was significantly reduced in the OA group compared with the HC group. Serum 12-HETE was reasonably effective in distinguishing OA from HC, with an AUC, sensitivity, specificity and Youden index of 0.776, 84.2%, 58.1% and 0.423, respectively. This ROC analysis suggested that serum 12-HETE could be a potential diagnostic marker of OA. Moreover, our results also showed that OA patients had markedly higher levels of PC (18:0/22:6) than non-OA subjects. Serum PC (18:0/22:6) produced an AUC of 0.761, sensitivity of 74.2%, specificity of 73.7% and Youden index of 0.479, indicating that it may be an important differential diagnostic feature of OA. In addition, to make the OA-related metabolites more applicable to predicting OA in clinical practice, we selected and integrated several potential biomarkers to establish models. Ultimately, ROC curve analysis in combination with logistic regression revealed that unknown m/z 239.091, PC (18:0/22:6), PC (18:0/18:0), p-CS, unknown m/z 195.101, 12-HETE and CVA could be selected as candidate biomarkers to distinguish OA from HC. This seven-metabolite panel had an AUC of 0.978, sensitivity of 96.7%, specificity of 93.0% and Youden index of 0.898, a better diagnostic capability than 12-HETE alone. Furthermore, a three-metabolite panel of unknown m/z 239.091, PC (18:0/18:0) and Phe had an AUC of 0.888, sensitivity of 89.5%, specificity of 80.6% and Youden index of 0.701 in discriminating OA from the non-OA group, which also represents a greater differential diagnostic value than PC (18:0/22:6) alone. In summary, these two panels performed well as indicators of OA and could be useful in laboratory medicine.
According to the metabolic pathway analyses, 12-HETE, a long-chain polyunsaturated fatty acid (LC-PUFA), is derived from arachidonic acid through 12-lipoxygenase-mediated oxygenation and participates in arachidonic acid metabolism (Charles-Schoeman et al., 2018). PC is converted into arachidonic acid and acetaldehyde, and is thus involved not only in arachidonic acid metabolism but also in glycerophospholipid metabolism. CVA, a monounsaturated fatty acid (MUFA), is produced from carbohydrates, proteins and acetyl coenzyme A via an endogenous pathway called de novo lipogenesis (DNL) (Djoussé et al., 2012;Wu et al., 2011). Phe, an essential aromatic amino acid, is hydroxylated to tyrosine in Phe metabolism (Dobrian et al., 2011). p-CS is a degradation product of Phe and tyrosine formed by sulfotransferase in intestinal epithelial cells and is involved in tyrosine and Phe metabolism (Peng et al., 2019) (Fig. 7).
Several potential limitations of this study should not be neglected. Firstly, two unknown features with m/z 195.101 and m/z 239.091 were found in this study, and their unknown identities hindered the understanding of their biological roles in OA. Secondly, the sample sizes of the OA and non-OA groups are not large enough to strongly validate these potential biomarkers. Thus, to evaluate the predictive ability of the potential biomarkers, blinded tests should be conducted for prediction and validation. In addition, further clinical tests with a larger sample size, together with in vitro and in vivo experiments, are required to confirm our findings and hypotheses.
In conclusion, this work demonstrated the feasibility of UPLC-QToF/MS-based untargeted serum metabolomics for capturing the metabolic signature and screening potential biomarkers of OA. It showed that the metabolic disturbance in OA involves lipid metabolism, glucose metabolism and amino acid metabolism, which may point towards mechanisms contributing to OA pathogenesis and progression. Moreover, single metabolites with an AUC greater than 0.7 have some value for the early and accurate diagnosis of OA, but the integrated panels yielded greater diagnostic value than any single metabolite. Thus, the above two metabolic panels may be valuable for OA diagnosis in clinical practice in the future.

Grant Disclosures
The following grant information was disclosed by the authors: National Natural Science Foundation of China: 21405017. Yao Gao conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Human Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): Ethics approval was obtained from the Ethics Committee of Fujian Medical University (No. 2019[34]).

Data Availability
The following information was supplied regarding data availability: The raw measurements are available as a Supplemental File.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.14563#supplemental-information.